RePOR: Mimicking humans on refactoring tasks. Are we there yet?
Refactoring is a maintenance activity that aims to improve design quality
while preserving the behavior of a system. Several (semi)automated approaches
have been proposed to support developers in this maintenance activity, based on
the correction of anti-patterns, which are 'poor' solutions to recurring design
problems. However, little quantitative evidence exists about the impact of
automatically refactored code on program comprehension, and about the contexts
in which automated refactoring can be as effective as manual refactoring.
Leveraging RePOR, an automated refactoring approach based on partial order
reduction techniques, we performed an empirical study to investigate whether the
code structure produced by automated refactoring affects the understandability
of systems during comprehension tasks. (1) We surveyed 80 developers, asking
them to identify, for each of 20 refactoring changes, whether it was generated
by developers or by a tool, and to rate the refactoring changes according to
their design quality; (2) we asked 30 developers to complete code comprehension
tasks on 10 systems that were refactored by either a freelancer or an automated
refactoring tool. To make the comparison fair, for a subset of refactoring
actions that introduce new code entities, only synthetic identifiers were
presented to practitioners. We measured developers' performance using the NASA
Task Load Index for their effort, the time that they spent performing the tasks,
and their percentages of correct answers. Our findings, despite current
technology limitations, show that it is reasonable to expect refactoring tools
to match developer code
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribution-ShareAlike 3.0 Unported license, which developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posted to Stack Overflow or code reused from Stack Overflow).
This paper aims to raise the awareness of the software engineering community
about potentially unethical code reuse activities taking place on Q&A websites
like Stack Overflow.
Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER
Comprehension of Ads-supported and Paid Android Applications: Are They Different?
The Android market is a place where developers offer paid and/or free apps to
users. Free apps are interesting to users because they can try them immediately
without incurring a monetary cost. However, free apps often have limited
features and/or contain ads when compared to their paid counterparts. Thus,
users may eventually need to pay to get additional features and/or remove ads.
While paid apps have clear market values, their ads-supported versions are not
entirely free because ads have an impact on performance.
In this paper, first, we perform an exploratory study about ads-supported and
paid apps to understand their differences in terms of implementation and
development process. We analyze 40 Android apps and we observe that (i)
ads-supported apps are preferred by users although paid apps have a better
rating, (ii) developers do not usually offer a paid app without a corresponding
free version, (iii) ads-supported apps usually have more releases and are
released more often than their corresponding paid versions, (iv) there is no
clear strategy for how developers set the prices of paid apps, (v) paid apps
do not usually include more functionalities than their corresponding
ads-supported versions, (vi) developers do not always remove ad networks in
paid versions of their ads-supported apps, and (vii) paid apps require fewer
permissions than ads-supported apps. Second, we carry out an experimental study
to compare the performance of ads-supported and paid apps and we propose four
equations to estimate the cost of ads-supported apps. We find that (i)
ads-supported apps use more resources than their corresponding paid versions,
with statistically significant differences, and (ii) paid apps could be
considered a more cost-effective choice for users because their cost can be
amortized in a short period of time, depending on their usage.
Comment: Accepted for publication in the proceedings of the IEEE International
Conference on Program Comprehension 201
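The abstract mentions four cost equations without stating them, so the sketch below does not reproduce them. It only illustrates the amortization argument under a hypothetical single-resource model: the ads-supported version is assumed to cost extra money per day through ad data traffic, and the paid version breaks even once that extra cost exceeds its price. All function names and numbers are illustrative assumptions.

```python
import math

def ad_cost_per_day(ad_traffic_mb_per_day: float, price_per_mb: float) -> float:
    """Extra money per day attributable to ad traffic (hypothetical model)."""
    return ad_traffic_mb_per_day * price_per_mb

def break_even_days(app_price: float, extra_cost_per_day: float) -> float:
    """Days of use after which buying the paid version is cheaper than running
    the ads-supported one; infinite if ads add no measurable cost."""
    if extra_cost_per_day <= 0:
        return math.inf
    return app_price / extra_cost_per_day

# Hypothetical numbers: a 0.99 app, 2 MB of ad traffic per day, 0.01 per MB of data
days = break_even_days(0.99, ad_cost_per_day(2.0, 0.01))
print(f"paid version amortized after about {days:.1f} days")
```

Heavier daily usage raises the ad traffic and shortens the break-even period, which is one way to read "depending on their usage".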
Grand Challenges of Traceability: The Next Ten Years
In 2007, the software and systems traceability community met at the first
Natural Bridge symposium on the Grand Challenges of Traceability to establish
and address research goals for achieving effective, trustworthy, and ubiquitous
traceability. Ten years later, in 2017, the community came together to evaluate
a decade of progress towards achieving these goals. These proceedings document
some of that progress. They include a series of short position papers,
representing current work in the community organized across four process axes
of traceability practice. The sessions covered topics including Trace
Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world
applications of Traceability, and Traceability Datasets and Benchmarks. Two breakout groups
focused on the importance of creating and sharing traceability datasets within
the research community, and discussed challenges related to the adoption of
tracing techniques in industrial practice. Members of the research community
are engaged in many active, ongoing, and impactful research projects. Our hope
is that ten years from now we will be able to look back at a productive decade
of research and claim that we have achieved the overarching Grand Challenge of
Traceability, which envisions traceability that is always present, built into
the engineering process, and that has "effectively disappeared without a
trace". We hope that others will see the potential that traceability has for
empowering software and systems engineers to develop higher-quality products at
increasing levels of complexity and scale, and that they will join the active
community of Software and Systems traceability researchers as we move forward
into the next decade of research.
Documentation of Machine Learning Software
Machine Learning software documentation is different from most of the
documentation studied in software engineering research. Often, the users of
this documentation are not software experts. The increasing interest in using
data science, and in particular machine learning, in different fields has
attracted scientists and engineers with various levels of knowledge about
programming and software engineering. Our ultimate goal is the automated
generation and adaptation of machine learning software documentation for users
with different levels of expertise. We are interested in understanding the
nature and triggers of the problems and the impact of the users' levels of
expertise on the process of documentation evolution. We will investigate Stack
Overflow Q/As and classify the documentation-related Q/As within the machine
learning domain to understand the types and triggers of the problems as well as
the potential change requests to the documentation. We intend to use the results
to build on top of state-of-the-art techniques for automatic documentation
generation and to extend the adoption, summarization, and explanation of
software functionalities.
Comment: The paper is accepted for publication in the 27th IEEE International
Conference on Software Analysis, Evolution and Reengineering (SANER 2020
Comparison and Evaluation of Clone Detection Tools
Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state of the art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs.
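The abstract does not name the six detectors, so the following is not any of them. It is a minimal sketch of the simplest category listed, text-based detection: whitespace is normalized, every window of consecutive lines is hashed, and windows occurring more than once are reported as clone candidates. The window size and function names are assumptions; lexical, syntactic, metric-based, and dependency-graph detectors would additionally catch renamed or restructured copies that this sketch misses.

```python
import hashlib
from collections import defaultdict

def normalize(line: str) -> str:
    # Text-based normalization: trim and collapse runs of whitespace
    return " ".join(line.split())

def find_line_clones(files: dict[str, str], window: int = 3) -> list[list[tuple[str, int]]]:
    """Group (filename, 1-based start line) pairs whose next `window`
    normalized lines are identical."""
    buckets: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, text in files.items():
        lines = [normalize(l) for l in text.splitlines()]
        for i in range(len(lines) - window + 1):
            chunk = lines[i:i + window]
            if any(not l for l in chunk):  # ignore windows containing blank lines
                continue
            key = hashlib.sha1("\n".join(chunk).encode()).hexdigest()
            buckets[key].append((name, i + 1))
    # A clone candidate is any window that occurs in more than one place
    return [locs for locs in buckets.values() if len(locs) > 1]

a = "int x = 1;\nint y = 2;\nreturn x + y;\n"
b = "void f() {\n  int x = 1;\n  int y = 2;\n  return x + y;\n}\n"
print(find_line_clones({"a.c": a, "b.c": b}))
```

Because indentation is normalized away, the indented copy in `b.c` still matches the code in `a.c`; comparing such candidates against a manually validated reference set is what yields the recall and precision figures the abstract refers to.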
Insider threat resistant SQL-injection prevention in PHP
Web sites are either static sites, programs, or databases. Very often they are
a mixture of these three aspects, integrating relational databases as a
back-end. Web sites require configuration and programming attention to assure
the security, confidentiality, and trustworthiness of the published information.
SQL-injection attacks rely on weak validation of the textual input used to
build database queries. Maliciously crafted input may threaten the
confidentiality and the security policies of Web sites relying on a database to
store and retrieve information. Furthermore, insiders may introduce malicious
code into a Web application: code that, when triggered by some specific input,
would violate security policies.
This paper presents an original approach that combines static analysis, dynamic
analysis, and code reengineering to automatically protect applications written
in PHP from both malicious input (outsider threats) and malicious code (insider
threats) that carry SQL-injection attacks.
The paper also reports preliminary results of experiments performed on an old,
SQL-injection-prone version of phpBB (version 2.0.0, 37,193 LOC of PHP version
4.2.2 code). Results show that our approach successfully improved the
resistance of phpBB 2.0.0 to SQL-injection attacks.
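The abstract does not describe the protection mechanism in detail, so the sketch below is not the paper's approach (which analyzes and reengineers PHP code). It uses Python's standard `sqlite3` module to illustrate the outsider threat being targeted: input concatenated into the query text changes the query's logic, while the same input bound as a parameter is treated purely as data. The table and function names are hypothetical.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # Hypothetical users table in an in-memory database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "s3cret"), ("bob", "hunter2")])
    return conn

def login_unsafe(conn: sqlite3.Connection, name: str, password: str) -> list:
    # Vulnerable: input is spliced into the SQL text, so its quotes become SQL syntax
    query = f"SELECT name FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchall()

def login_safe(conn: sqlite3.Connection, name: str, password: str) -> list:
    # Parameterized: the driver binds input as values, never as SQL syntax
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()

conn = make_db()
attack = "' OR '1'='1"
print(login_unsafe(conn, "mallory", attack))  # every row matches: the WHERE clause is always true
print(login_safe(conn, "mallory", attack))    # []
```

Parameterized queries address malicious input only; they do not stop an insider who plants code that builds its own queries, which is the second threat the paper's analysis-based approach is designed to cover.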
A Google-inspired error-correcting graph matching algorithm
Graphs and graph algorithms are applied in many different areas, including
civil engineering, telecommunications, bioinformatics, and software engineering.
While exact graph matching is grounded in a consolidated theory and has
well-known results, approximate graph matching is still an open research
subject.
This paper presents an error-tolerant approximate graph matching algorithm
based on tabu search using the Google-like PageRank algorithm. We report
preliminary results obtained on two graph data benchmarks. The first one is the
TC-15 database [14], a graph database at the University of Naples, Italy. These
graphs are limited to exact matching. The second one is a novel data set of
large graphs generated by randomly mutating TC-15 graphs in order to evaluate
the performance of our algorithm. Such a mutation approach allows us to gain
insight not only into time but also into matching accuracy.
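The abstract does not say how PageRank is combined with tabu search. One plausible reading is that PageRank scores serve as vertex signatures, so that vertices with similar scores in the two graphs are tried as candidate matches first. The sketch below shows only the standard PageRank power iteration on an adjacency-list graph; the damping factor, tolerance, and function name are assumptions, and the tabu search itself is not shown.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank. Every edge target must also be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is spread uniformly
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        # Stop once the L1 change between iterations is negligible
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```

Since the scores sum to 1 and depend only on graph structure, they can be compared across two graphs; a randomly mutated graph perturbs the scores only locally, which is what makes them usable as an error-tolerant matching hint.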
A feedback based quality assessment to support open source software evolution: the GRASS case study
The Effect of Communication Overhead on Software Maintenance Project Staffing: a Search-Based Approach
Brooks' landmark book 'The Mythical Man-Month' established the observation that there is no simple conversion between people and time in large-scale software projects. Communication and training overheads yield a subtle and variable relationship between the person-months required for a project and the number of people needed to complete the task within a given time frame. This paper formalises several instantiations of Brooks' law and uses these to construct project schedule and staffing instances, using a search-based project staffing and scheduling approach, on data from two large real-world maintenance projects. The results reveal the impact of different formulations of Brooks' law on project completion time and on staff distribution across teams, and the influence of other factors, such as the presence of dependencies between work packages, on the effect of communication overhead.
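The abstract does not state which instantiations of Brooks' law are formalised. A simple, commonly cited reading is that a team of n people has n(n-1)/2 pairwise communication channels, each consuming some effort. The sketch below assumes a hypothetical linear overhead per pair (in person-months per month) and exhaustively searches for the team size that minimizes completion time; the paper's search-based approach also handles constraints this toy model ignores, such as dependencies between work packages.

```python
import math

def channels(n: int) -> int:
    """Pairwise communication channels in a team of n people."""
    return n * (n - 1) // 2

def completion_time(effort_pm: float, n: int, overhead_per_pair: float) -> float:
    """Months to deliver effort_pm person-months of work with n people, assuming
    each pair spends overhead_per_pair person-months per month communicating
    (hypothetical linear-overhead model)."""
    capacity = n - overhead_per_pair * channels(n)  # productive person-months per month
    return effort_pm / capacity if capacity > 0 else math.inf

def best_team_size(effort_pm: float, overhead_per_pair: float, max_n: int = 100) -> int:
    """Exhaustively pick the team size with the shortest completion time."""
    return min(range(1, max_n + 1),
               key=lambda n: completion_time(effort_pm, n, overhead_per_pair))

print(best_team_size(100, 0.03))  # → 34
```

With these assumed numbers, productive capacity peaks at 34 people; every additional person adds more communication overhead than productive effort, so the project slows down, which is the essence of Brooks' observation.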